Effect of Hunting on Red Deer

P15.2 Advanced Practical Project (Fortgeschrittenes Praxisprojekt)

Nikolai German, Thomas Witzani, Ziqi Xu, Zhengchen Yuan, Baisu Zhou

Dr. Nicolas Ferry (Bavarian Forest National Park) / Daniel Schlichting (StabLab)

31 Jan 2025

Agenda

  1. The Background
  2. The Data
  3. The Models
  4. The Wrap-up

Motivation

  • Hunting has a direct numerical effect on animal populations: individuals are removed
  • Hunting can also have non-lethal effects, e.g. stress
  • Goal: assess the short-term stress response of red deer to hunting events in the Bavarian Forest National Park

Data-Generating Process

  • A deer roams freely in the Bavarian Forest National Park
  • Its movement is tracked by a GPS collar
  • A hunting event happens
  • After some time, the deer defecates (the defecation event)
  • Researchers subsequently go to the defecation location and collect a faecal sample

FCMs as a Measure of Stress

  • Faecal cortisol metabolites (FCMs) are substances found in the faeces of animals
  • The FCM level is used as a measure of past stress: higher stress \(\Rightarrow\) higher FCM level
  • Stress \(\Rightarrow\) secretion of stress hormones (glucocorticoids such as cortisol) \(\Rightarrow\) gut retention \(\Rightarrow\) FCM in faeces
  • Gut retention time \(\approx\) 19 hours
  • After defecation, FCM levels in the sample decay over time

Huber et al. (2003)

Research Questions

  • What is the effect of temporal and spatial distance to hunting events on FCM levels?
  • Does the time between the defecation event and sample collection affect FCM levels?

Approach

  • Model FCM levels as a function of spatial and temporal distance to hunting events, among other covariates

  • Expectations:

    • Higher FCM levels the closer the hunting event is in time and space
    • Lower FCM levels the more time passes between defecation and sampling

The Datasets

  • FCM Data
  • Hunting Events
  • Movement Data

FCM Data

Contains information on 809 faecal samples, including:

  • the FCM level [ng/g],
  • the time and location of sampling,
  • to which deer the sample belongs,
  • when the defecation happened.

Samples were taken at irregular time intervals from 2020 to 2022.

Hunting Events

  • Contains the location and time of \(\geq\) 700 hunting events from 2020 to 2022.
  • 519 hunting events have complete location and time information.

Movement Data

  • Contains the locations of 40 GPS-collared deer from Feb. 2020 to Feb. 2023.
  • Locations are recorded at hourly intervals.

Limited Data, Large Uncertainty

  • Hunting events are single points in time and space.
  • Deer locations are only known at hourly intervals \(\Rightarrow\) exact distances are unknown \(\Rightarrow\) approximation needed, large uncertainty!
  • Each deer encountered only a few hunting events.

Distance Approximation

The deer's location at the time of a hunting event is approximated by linear interpolation between the GPS fixes recorded immediately before and after the event.
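A minimal sketch of the interpolation step, assuming the GPS fixes are in a projected planar coordinate system (so Euclidean distance is meaningful) and that the hunting event falls between two consecutive fixes. The names `Fix`, `interpolate_position` and `distance_to_hunt` are hypothetical, not taken from the project code.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fix:
    """One GPS fix; x/y are assumed to be projected (planar) coordinates."""
    time: datetime
    x: float
    y: float


def interpolate_position(before: Fix, after: Fix, t: datetime) -> tuple[float, float]:
    """Linearly interpolate the deer position at time t between two fixes."""
    span = (after.time - before.time).total_seconds()
    if span <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    w = (t - before.time).total_seconds() / span  # fraction of the interval elapsed
    if not 0.0 <= w <= 1.0:
        raise ValueError("t must lie between the two fixes")
    return (before.x + w * (after.x - before.x),
            before.y + w * (after.y - before.y))


def distance_to_hunt(before: Fix, after: Fix, hunt_time: datetime,
                     hunt_x: float, hunt_y: float) -> float:
    """Euclidean distance between the interpolated deer position and the hunt."""
    x, y = interpolate_position(before, after, hunt_time)
    return math.hypot(x - hunt_x, y - hunt_y)
```

For a hunt halfway between a fix at (0, 0) and one at (600, 800), the interpolated position is (300, 400). The approximation ignores any movement between fixes, which is one source of the uncertainty noted above.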

Relevant Hunting Events

A hunting event is considered relevant to a faecal sample, if

  • the time difference between hunting and defecation lies between the lower and upper gut retention time (GRT) thresholds, and
  • the distance between the deer and the hunting event is \(\leq\) distance threshold.

Figure: hunting events around a deer by time difference and distance, marking the 19-hour GRT target, the GRT low and high thresholds, and the distance threshold.
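The two relevance conditions can be sketched as a simple filter. This assumes inclusive bounds on both thresholds and a time difference measured in hours from the hunting event to defecation; `CandidateHunt` and `relevant_hunts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CandidateHunt:
    hunt_id: str
    time_diff_h: float  # hours from the hunting event to the defecation event
    distance: float     # deer-to-hunt distance, same unit as the threshold


def relevant_hunts(candidates, grt_low_h, grt_high_h, distance_threshold):
    """Keep hunts inside the GRT window and within the distance threshold.

    Both bounds are treated as inclusive (an assumption)."""
    return [
        c for c in candidates
        if grt_low_h <= c.time_diff_h <= grt_high_h
        and c.distance <= distance_threshold
    ]
```

With GRT thresholds 0 and 36 and distance threshold 10, a hunt 40 hours before defecation or 12 units away is dropped, while one 30 hours before and exactly 10 units away is kept.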

The Most Relevant Hunting Event

Among the relevant hunting events, the most relevant one is selected by one of three proximity criteria:

  • closest in time (to the GRT target of 19 hours),
  • nearest (smallest spatial distance),
  • highest score.

Figure: hunting events by time difference and distance with the GRT thresholds, the 19-hour GRT target and the distance threshold; the nearest, the highest-score and the closest-in-time (to 19 hours) events are highlighted.
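A sketch of selecting the most relevant hunt under each of the three criteria, assuming the candidates have already been filtered for relevance. `score_fn` is a hypothetical hook standing in for the scoring function \(S(d, t)\); the criterion strings are illustrative labels.

```python
from dataclasses import dataclass


@dataclass
class CandidateHunt:
    hunt_id: str
    time_diff_h: float  # hours from the hunting event to defecation
    distance: float     # deer-to-hunt distance


def most_relevant(candidates, criterion, target_h=19.0, score_fn=None):
    """Return the most relevant hunt, or None if there are no candidates."""
    if not candidates:
        return None
    if criterion == "closest_in_time":
        # smallest deviation from the GRT target
        return min(candidates, key=lambda c: abs(c.time_diff_h - target_h))
    if criterion == "nearest":
        # smallest spatial distance
        return min(candidates, key=lambda c: c.distance)
    if criterion == "highest_score":
        if score_fn is None:
            raise ValueError("score_fn is required for 'highest_score'")
        return max(candidates, key=lambda c: score_fn(c.distance, c.time_diff_h))
    raise ValueError(f"unknown criterion: {criterion}")
```

The three criteria can pick different events for the same sample, which is why they lead to different modelling datasets.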

The Scoring Function

We define the scoring function as follows:

\[ S(d, t) \propto \begin{cases} \dfrac{1}{d^2} \cdot f_{\mathcal{N}(\mu, \sigma^2)}(t), & t \leq \mu, \\ \dfrac{1}{d^2} \cdot f_{\mathrm{Laplace}(\mu, b)}(t), & t > \mu, \end{cases} \] where:

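\[ \begin{aligned} d &: \text{distance between deer and hunting event} \\ t &: \text{time difference between hunting event and defecation} \\ \mu &: \text{GRT target} = 19 \text{ hours} \\ \sigma, b &: \text{scale parameters of the normal and Laplace densities} \end{aligned} \]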
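A minimal sketch of the (unnormalised) score: a normal density in \(t\) up to the GRT target and a Laplace density beyond it, both multiplied by \(1/d^2\). The slides do not state values for \(\sigma\) and \(b\); the defaults below (6 and 24 hours) are hypothetical placeholders.

```python
import math

MU_HOURS = 19.0  # GRT target from the slides


def normal_pdf(t: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(t: float, mu: float, b: float) -> float:
    return math.exp(-abs(t - mu) / b) / (2.0 * b)


def score(d: float, t: float, mu: float = MU_HOURS,
          sigma: float = 6.0, b: float = 24.0) -> float:
    """Unnormalised S(d, t); sigma and b are hypothetical placeholder values."""
    if d <= 0:
        raise ValueError("distance must be positive")
    # normal branch for t <= mu, heavier-tailed Laplace branch for t > mu
    time_weight = normal_pdf(t, mu, sigma) if t <= mu else laplace_pdf(t, mu, b)
    return time_weight / d ** 2
```

Doubling the distance divides the score by four, and the score falls off on both sides of 19 hours, more slowly for longer delays. With these placeholder values the two branches do not meet continuously at \(\mu\); whether the original rescales them is not stated.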

The Scoring Function

The marginal effects of distance and of the time elapsed since the hunting event on the score:

The Fused Data

Final Datasets

We propose three datasets for modelling:

Dataset  GRT low [h]  GRT high [h]  Distance threshold  Proximity criterion  Deer  Observations
1        0            36            10                  closest in time      35    149
2        0            36            10                  nearest              35    147
3        0            200           15                  highest score        36    223

The Models

For modelling, we consider the following covariates, defined for each pair of FCM sample and most relevant hunting event:

  • Time difference (between hunting event and defecation)
  • Distance (between deer and hunting event)
  • Sample delay (time between defecation and sample collection)
  • Defecation day (day of year as integer)
  • Number of other relevant hunting events
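A sketch of building one model row from a sample and its most relevant hunt. It assumes times are naive `datetime` values in one time zone, durations are in hours, and "number of other relevant hunting events" means all relevant hunts for the sample minus the selected one. `ModelRow` and `build_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ModelRow:
    time_difference_h: float
    distance: float
    sample_delay_h: float
    defecation_day: int
    num_other_relevant: int


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600.0


def build_row(defecation_time: datetime, collection_time: datetime,
              hunt_time: datetime, hunt_distance: float,
              n_relevant: int) -> ModelRow:
    """n_relevant: number of relevant hunts for this sample, including the selected one."""
    if n_relevant < 1:
        raise ValueError("a row needs at least one relevant hunting event")
    return ModelRow(
        time_difference_h=hours_between(hunt_time, defecation_time),
        distance=hunt_distance,
        sample_delay_h=hours_between(defecation_time, collection_time),
        defecation_day=defecation_time.timetuple().tm_yday,  # day of year, 1-based
        num_other_relevant=n_relevant - 1,  # exclude the selected hunt itself
    )
```

For a sample defecated on 1 March 2021 at 12:00 and collected the next morning at 06:00, the sample delay is 18 hours and the defecation day is 60.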

The Models

We chose two modelling approaches:

  1. Machine learning: a prediction-focused model, here an XGBoost model
  2. Statistical modelling: a model that helps to understand the effects of the covariates, here a generalized additive mixed model (GAMM)

A. XGBoost Setup

  1. Split Data: We divide each dataset into training and testing sets. (75% - 25%)
  2. Set Hyperparameter Grid: We define an initial range of values for each hyperparameter.
  3. Optimize Hyperparameters: We perform a cross-validated grid search over hyperparameter combinations and iteratively readjust the grid based on the test RMSE until convergence.
  4. Train Final Model on Full Data: We train on the entire dataset with the optimized hyperparameters and limit overfitting by choosing the number of boosting rounds (n_rounds) that keeps the test RMSE low.
  5. Aggregate Results: We run the pipeline 40 times with different seeds and use the average RMSE and average predictions to evaluate overall performance.

We do this separately for all three datasets (closest in time, nearest, highest score).
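A stdlib-only sketch of steps 1 and 5: a seeded 75/25 split repeated 40 times, with the test RMSE averaged over runs. The model is passed in as a `fit_predict` callable; `mean_model` is a hypothetical placeholder predicting the training mean, where the actual pipeline would plug in a tuned XGBoost model. Grid search, cross-validation and averaging of predictions are omitted.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def split_indices(n, test_frac, seed):
    """Seeded random split into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def mean_model(X_train, y_train, X_test):
    """Placeholder for an XGBoost fit/predict: predicts the training mean."""
    m = statistics.fmean(y_train)
    return [m] * len(X_test)


def repeated_evaluation(X, y, fit_predict, n_runs=40, test_frac=0.25):
    """Return mean and standard deviation of the test RMSE over n_runs seeds."""
    scores = []
    for seed in range(n_runs):
        train, test = split_indices(len(y), test_frac, seed)
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(rmse([y[i] for i in test], preds))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean and standard deviation over seeds mirrors the "Mean RMSE" and "SD RMSE" columns of the results table; with only about 150 to 220 observations per dataset, single splits would vary considerably.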

A. XGBoost Results

All models use the objective reg:squarederror and the evaluation metric rmse.

Dataset          Max depth  Eta     Gamma  Subsample  Colsample bytree  Min child weight  Mean RMSE  SD RMSE   Observations
closest in time  4          0.1635  5.850  0.5918     0.9921            4.640             168.6336   24.40957  149
nearest          4          0.1661  5.893  0.5956     0.9832            4.747             151.3186   17.91780  147
highest score    5          0.1744  5.834  0.6063     1.0000            4.766             147.9845   16.50250  223

B. Generalized Additive Mixed Model

  • Family: Gamma

  • Log link for interpretability

  • Let \(i = 1,\dots,N\) index the deer and \(j = 1,\dots,n_i\) index the faecal samples of deer \(i\)

\[ \begin{aligned} \textup{FCM}_{ij} &\overset{\mathrm{iid}}{\sim} \operatorname{Ga}\left( \nu, \frac{\nu}{\mu_{ij}} \right) \quad \text{for } j = 1,\dots,n_i, \\ \mu_{ij} &= \mathbb{E}(\textup{FCM}_{ij}) = \exp(\eta_{ij}), \\ \eta_{ij} &= \beta_0 + \beta_1 \cdot \textup{number of other relevant hunting events}_{ij} \\ &\quad + f_1(\textup{time difference}_{ij}) + f_2(\textup{distance}_{ij}) \\ &\quad + f_3(\textup{sample delay}_{ij}) + f_4(\textup{defecation day}_{ij}) + \gamma_i, \\ \gamma_i &\overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_\gamma^2) \quad \text{for } i = 1,\dots,N \end{aligned} \]

\(f_1, f_2, f_3, f_4\) are penalised cubic regression splines.

B. Generalized Additive Mixed Model

Closest in time: parametric terms (log scale)

Term           Estimate  Std. error
(Intercept)       5.824       0.053
NumOtherHunts    -0.137       0.061
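Because the GAMM uses a log link, the linear coefficient acts multiplicatively on the expected FCM level. Writing \(k\) for the number of other relevant hunting events and holding the smooth terms and the deer effect \(\gamma_i\) fixed, a one-unit increase in \(k\) gives the ratio below; plugging in the closest-in-time point estimate is simple arithmetic on the table above, not an additional result.

```latex
\frac{\mathbb{E}(\mathrm{FCM}_{ij} \mid k+1)}{\mathbb{E}(\mathrm{FCM}_{ij} \mid k)}
  = \frac{\exp\bigl(\eta_{ij}(k) + \beta_1\bigr)}{\exp\bigl(\eta_{ij}(k)\bigr)}
  = e^{\beta_1},
\qquad
e^{\hat\beta_1} = e^{-0.137} \approx 0.872
```

Read as a point estimate, each additional relevant hunting event corresponds to roughly 13% lower expected FCM in this dataset; the standard error has to be taken into account before drawing conclusions.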

B. Generalized Additive Mixed Model

Nearest: parametric terms (log scale)

Term           Estimate  Std. error
(Intercept)       5.812       0.054
NumOtherHunts    -0.103       0.060

B. Generalized Additive Mixed Model

Highest score: parametric terms (log scale)

Term           Estimate  Std. error
(Intercept)       5.905       0.081
NumOtherHunts    -0.016       0.014

B. Results

Category        Subcategory                              Description
Diagnostics     QQ plot                                  Residuals mostly follow the expected distribution
Diagnostics     Residuals vs. predictor                  No major pattern
Diagnostics     Histogram of residuals                   Reasonable fit, some variance
Diagnostics     Observed vs. fitted                      Moderate spread, some unexplained variance
Smooth effects  Time difference & distance               Weak or inconsistent
Smooth effects  Sample delay                             Shows some effect
Linear effects  Number of other relevant hunting events  No significant impact

Conclusion

  • Due to the high uncertainty, we could not detect a relevant effect of spatial or temporal distance on FCM levels.
  • In some models, we observed the expected decrease of FCM levels with longer time between defecation and sample collection.
  • With more data points, the uncertainty would likely shrink.

Discussion

  • How can spatial and temporal distance be minimized at the same time?

  • How can a larger part of the data be used?

Appendix

GAMM Marginal Predictions (REML / closest in time)

GAMM Marginal Predictions (REML / nearest)

GAMM Marginal Predictions (REML / highest score)

GAMM Marginal Predictions (GCV / closest in time)

GAMM Marginal Predictions (GCV / nearest)

GAMM Marginal Predictions (GCV / highest score)